K-means Clustering
K-means clustering is a widely used unsupervised machine learning algorithm that partitions a dataset into K distinct, non-overlapping clusters.
How K-means Works
The K-means algorithm proceeds in four steps:
- Initialization: Randomly select K points as initial cluster centroids
- Assignment: Assign each data point to the nearest centroid
- Update: Recalculate each centroid as the mean of all points currently assigned to its cluster
- Repeat: Iterate assignment and update steps until convergence (centroids no longer move significantly)
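The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `kmeans` and `squared_distance`, the `tol` and `max_iter` parameters, the choice to initialize from randomly sampled data points, and the rule that an empty cluster keeps its previous centroid are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Basic K-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update: mean of the points assigned to each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Convergence: stop once the largest squared centroid shift is within tol.
        shift = max(
            squared_distance(old, new)
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


# Two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the initial centroids are chosen at random, the cluster indices (0 or 1) may swap between seeds, and on less cleanly separated data different seeds can converge to different partitions.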
Mathematical Formulation
K-means aims to minimize the within-cluster sum of squares (WCSS), also known as inertia:

$$\text{WCSS} = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

Where:
- $C_i$ is the set of points in cluster $i$
- $\mu_i$ is the centroid of cluster $i$
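As a small worked example, the within-cluster sum of squares can be computed directly from a set of points, their cluster labels, and the cluster centroids. The helper name `wcss` and the toy data below are hypothetical and chosen so that the arithmetic is easy to check by hand.

```python
def wcss(points, labels, centroids):
    """Sum of squared distances from each point to its own cluster centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        centroid = centroids[label]
        total += sum((a - b) ** 2 for a, b in zip(point, centroid))
    return total


# Two clusters of two points each; each centroid is the mean of its cluster.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 1.0), (10.0, 2.0)]

# Per-point contributions: 1 + 1 + 4 + 4
print(wcss(points, labels, centroids))  # 10.0
```

A lower value means points sit closer to their assigned centroids, which is the quantity the assignment and update steps try to reduce.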